Slovak Morphosyntactic Tagset

Authors

  • Radovan Garabík
  • Mária Simková

Abstract

Morphological annotation is an essential and very common type of linguistic information in corpora, especially for highly inflectional languages. The morphological tagset used in the Slovak National Corpus was designed with several goals in mind: the tags are compact and easily human-readable without sacrificing informational content. The tags consist of ASCII letters, digits and several other characters. In general, they have a variable number of symbols, but the order of the symbols is fixed, and each category or specific feature is assigned a particular character, which can be shared among several parts of speech. The tagset is highly functional and pragmatic, although some allowances had to be made to accommodate the traditional analysis of Slovak morphology and part-of-speech categories.
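The positional design described in the abstract can be illustrated with a small decoder. This is a minimal sketch, not the Slovak National Corpus specification: the schema, the letters it uses and the example tags "SSms1" and "AAfs2x" are hypothetical, only loosely modeled on tags of that shape. It shows three properties the abstract mentions: the first character selects the part of speech, the remaining characters are read in a fixed order (so tag length varies by part of speech), and one value table, such as gender or case, is shared by several parts of speech.

```python
# Minimal sketch of a positional morphosyntactic tag decoder.
# The schema below is HYPOTHETICAL and only loosely modeled on tags such as
# "SSms1"; it is not the official Slovak National Corpus specification.

# Value tables shared by several parts of speech (hypothetical letters).
PARADIGM = {"S": "substantive", "A": "adjectival"}
GENDER = {"m": "masculine animate", "i": "masculine inanimate",
          "f": "feminine", "n": "neuter"}
NUMBER = {"s": "singular", "p": "plural"}
CASE = {str(i): name for i, name in enumerate(
    ["nominative", "genitive", "dative", "accusative",
     "vocative", "locative", "instrumental"], start=1)}
DEGREE = {"x": "positive", "y": "comparative", "z": "superlative"}

# Hypothetical schema: POS character -> (POS name, categories in fixed order).
# Tags have different lengths per POS, but the order within each is obligatory.
SCHEMA = {
    "S": ("noun", [("paradigm", PARADIGM), ("gender", GENDER),
                   ("number", NUMBER), ("case", CASE)]),
    "A": ("adjective", [("paradigm", PARADIGM), ("gender", GENDER),
                        ("number", NUMBER), ("case", CASE),
                        ("degree", DEGREE)]),
    "T": ("particle", []),  # no inflectional categories
}


def decode(tag: str) -> dict[str, str]:
    """Decode a positional tag into named features; raise ValueError if malformed."""
    if not tag:
        raise ValueError("empty tag")
    if tag[0] not in SCHEMA:
        raise ValueError(f"unknown part of speech {tag[0]!r}")
    pos_name, categories = SCHEMA[tag[0]]
    values = tag[1:]
    if len(values) != len(categories):
        raise ValueError(f"{pos_name} tag needs {len(categories)} feature "
                         f"characters, got {len(values)}")
    features = {"pos": pos_name}
    # Each position is interpreted by the category assigned to that slot.
    for (category, table), char in zip(categories, values):
        if char not in table:
            raise ValueError(f"invalid {category} value {char!r}")
        features[category] = table[char]
    return features
```

For example, `decode("SSms1")` returns `{'pos': 'noun', 'paradigm': 'substantive', 'gender': 'masculine animate', 'number': 'singular', 'case': 'nominative'}`. Because nouns and adjectives reuse the same gender, number and case tables, a character such as `m` means the same thing wherever that category appears, which keeps the tags compact and readable.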

Similar resources

The Development of a Morphosyntactic Tagset for Afrikaans and its Use with Statistical Tagging

In this paper, we present a morphosyntactic tagset for Afrikaans based on the guidelines developed by the Expert Advisory Group on Language Engineering Standards (EAGLES). We compare our slim yet expressive tagset, MAATS (Morphosyntactic AfrikAans TagSet), with an existing one which primarily focuses on a detailed morphosyntactic and semantic description of word forms. MAATS will primarily be u...

Morphological Analysis of the Slovak National Corpus

1. Basis of a morphological analysis of the Slovak National Corpus A question of morphological (or morphosyntactic) analysis has been a key problem for natural language processing (NLP) for several years. Automatic morphological annotation is a useful tool especially with regard to the corpus data processing. In this respect morphological annotation has been considered also during the developme...

Designing and Evaluating a Russian Tagset

This paper reports the principles behind designing a tagset to cover Russian morphosyntactic phenomena, modifications of the core tagset, and its evaluation. The tagset and associated morphosyntactic specifications are based on the MULTEXT-East framework, while the decisions in designing it were aimed at achieving a balance between parameters important for linguists and the possibility to detec...

Combining Ontologies and Neural Networks for Analyzing Historical Language Varieties. A Case Study in Middle Low German

In this paper, we describe experiments on the morphosyntactic annotation of historical language varieties for the example of Middle Low German (MLG), the official language of the German Hanse during the Middle Ages and a dominant language around the Baltic Sea by the time. To our best knowledge, this is the first experiment in automatically producing morphosyntactic annotations for Middle Low G...

Towards a reference tagset for Japanese

This is a progress report on ongoing research aimed at proposing a ‘reference’ morphosyntactic part-of-speech tagset for the Japanese language. Such a tagset should be linguistically motivated, explicit, broadly applicable, and computationally tractable. Being well defined, such a tagset should be easily adapted in specific ways (e.g. limited, extended or modified). The author is currently atte...

Journal:
  • J. Language Modelling

Volume: 0

Publication date: 2012